Audio-Visual Speaker Recognition for Video Broadcast News

Authors

Benoît Maison

Chalapathy Neti

Andrew W. Senior

Abstract

Significant progress has been made in the transcription of the audio stream in the broadcast news domain for both radio news and TV news (HUB4 task). Such transcripts provide an excellent means of indexing video content for search and retrieval. Speaker identification is an important technology in this domain both for selecting high-accuracy speaker-dependent models for transcription and as an index for search and retrieval of video content. However, the transcription accuracy under acoustically degraded conditions (such as background noise) and channel mismatch (telephone) still needs further improvements. To make improvements in such degraded conditions is a hard problem. We have begun investigating the combination of audio-based processing with visual processing for both speech and speaker recognition to improve the accuracy in acoustically degraded conditions. The use of two independent sources of information brings significantly increased robustness to signal degradation since degradations in the two channels are uncorrelated, and the use of visual information allows a much faster speaker identification than possible with acoustic information. In this paper, we present some encouraging preliminary results for audio-visual speaker recognition for TV broadcast news data (CNN).

Related articles

Audio-visual speaker recognition for video broadcast news: some fusion techniques

Audio-based speaker identification degrades severely when there is a mismatch between training and test conditions either due to channel or noise. In this paper, we explore various techniques to fuse video-based speaker identification with audio-based speaker identification to improve the performance under mismatched conditions. Specifically, we explore techniques to optimally determine the relativ...

UCBN: A new audio-visual broadcast news corpus for multimodal speaker verification studies

The performance of face, voice, and multimodal speaker verification systems in complex and non-controlled scenarios, is typically lower than systems developed in highly controlled environments. With the aim to facilitate the development of robust multi-modal speaker recognition systems, a new multi-modal (audio-visual) Australian broadcast UCBN (University of Canberra Broadcast News) corpus was...

Detecting News Reporting Using Audio/Visual Information

This paper proposes an integrated approach to discriminate news reporting from everything else in broadcast news data based on both audio and visual information. The separation of news reporting segments from others not only can provide useful indices for video streams but also serves as a pre-processing step for tasks such as speaker identification and speech recognition so that only speech seg...

Various Methods for Visual Speaker Identification for Automatic Continuous Speech Recognition in TV Broadcast Programs

This paper describes different methods and algorithms used for speaker identification from video recordings in the transcription of TV broadcast news. The information from visual speaker identification was used in our complex system for automatic continuous speech recognition of TV broadcast programs because it is possible to use speaker adapted (SA) Hidden Markov Models (HMMs) if we hav...

Information Access using Speech, Speaker and Face Recognition

We describe a scheme to combine the results of audio and face identification for multimedia indexing and retrieval. Audio analysis consists of speech and speaker recognition derived from a broadcast news video clip. The video component is analyzed to identify the persons in the same video clip using face recognition. When applied individually both speaker and face recognition schemes have limitat...

Journal:

VLSI Signal Processing

Volume 29

Published 2001
